24 research outputs found

New Multithreaded Hybrid CPU/GPU Approach to Hartree−Fock

Author: Andrey Asadchev
Asadchev A.
Buttari A.
Davidson E. R.
Furlani T. R.
Gordon M. S.
Ishimura K.
Janssen C. L.
Mark S. Gordon
Rys J.
Turney J. M.
Ufimtsev I. S.
Wilkinson K. A.
Yasuda K.
Publication venue: Iowa State University Digital Repository
Publication date: 01/09/2012

In this article, a new multithreaded Hartree–Fock CPU/GPU method is presented which utilizes automatically generated code and modern C++ techniques to achieve a significant improvement in memory usage and computer time. In particular, the new Rys Quadrature and Fock Matrix algorithms, implemented as a stand-alone C++ library with C and Fortran bindings, provide up to a 40% improvement over the traditional Fortran Rys Quadrature. The C++ GPU HF code provides approximately a factor of 17.5 improvement over the corresponding C++ CPU code.

Digital Repository @ Iowa State University (ISU)

Crossref

Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory

Author: Asadchev Andrey
Clark David
de Jong Wibe A.
Popovici Doru Thom
Valeev Edward F.
Waldrop Johnathan
Williams-Young David B.
Windus Theresa
Publication venue: AIP Publishing
Publication date: 24/03/2023

With the growing reliance of modern supercomputers on accelerator-based architectures such as GPUs, the development and optimization of electronic structure methods to exploit these massively parallel resources has become a recent priority. While significant strides have been made in the development of GPU accelerated, distributed memory algorithms for many-body (e.g. coupled-cluster) and spectral single-body (e.g. planewave, real-space and finite-element density functional theory [DFT]) methods, the vast majority of GPU-accelerated Gaussian atomic orbital methods have focused on shared memory systems with only a handful of examples pursuing massive parallelism on distributed memory GPU architectures. In the present work, we present a set of distributed memory algorithms for the evaluation of the Coulomb and exact-exchange matrices for hybrid Kohn-Sham DFT with Gaussian basis sets via direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods, respectively. The absolute performance and strong scalability of the developed methods are demonstrated on systems ranging from a few hundred to over one thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter supercomputer.

Comment: 45 pages, 9 figures

arXiv.org e-Print Archive

Computational Physics on Graphics Processing Units

Author: A. Asadchev
A. Castro
A. Harju
A. McAdams
A.G. Anderson
A.P. Lyubartsev
A.W. Götz
B.L. Tembre
C. Bonati
C. McNeile
C.M. Isborn
D.J. Hardy
E. Darve
G. Bhanot
G. Egri
G. Kresse
H.J. Rothe
I. Montvay
I. Samish
I. Ufimtsev
I.S. Ufimtsev
J. Enkovaara
J. Gao
J. Hubbard
J.A. Anderson
J.A. McCammon
J.E. Stone
J.S. Meredith
K. Esler
K. Moreland
K. Yasuda
L. Genovese
L. Greengard
L. Gu
L. Ha
M. Bordag
M. Göckeler
M. Hasenbusch
M. Hutchinson
M. Macedonia
M.C. Gutzwiller
M.C. Payne
M.P. Allen
N. Cardoso
N. Goodnight
N. Luehr
N.A. Gumerov
P. Giannozzi
P. Kipfer
P. Petreczky
R. Parr
R.D. Mawhinney
R.D. Skeel
R.G. Belleman
S. Hakala
S. Ihnatsenka
S. Maintz
T. Shirakawa
T. Siro
T. Takahashi
T.W. Chiu
V. Rokhlin
V. Springel
W. Jia
W. Kohn
W.M.C. Foulkes
X. Andrade
Y. Aoki
Y. Chen
Z. Fodor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up a variety of algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics and on quantum simulations for electronic structure calculations using density functional theory, wave function techniques, and quantum field theory.

Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012

arXiv.org e-Print Archive

Crossref

The grid-based fast multipole method - a massively parallel numerical scheme for calculating two-electron interaction energies

Author: Amdahl
Anderson
Asadchev
Bhaskaran-Nair
Bischoff
Boys
Carlson
Choi
Dachsel
Dage Sundholm
Elias A. Toivanen
Fischer
Friedrichs
García-Risueño
Greengard
Helgaker
Jensen
Jusélius
Kobus
Koga
Kussmann
Laaksonen
Levine
Losilla
Maurer
Moore
Olivares-Amaya
Pérez-Jordá
Rudberg
Sergio A. Losilla
Singer
Steinborn
Stone
Sundholm
Titov
van Meel
Vogt
Watson
Weber
White
Wu
Yasuda
Publication date: 01/01/2015

Algorithms and working expressions for a grid-based fast multipole method (GB-FMM) have been developed and implemented. The computational domain is divided into cubic subdomains, organized in a hierarchical tree. The contribution to the electrostatic interaction energies from pairs of neighboring subdomains is computed using numerical integration, whereas the contributions from more distant subdomains are obtained using multipole expansions. The multipole moments of the subdomains are obtained by numerical integration. Linear scaling is achieved by translating and summing the multipoles according to the tree structure, such that each subdomain interacts with a number of subdomains that is almost independent of the size of the system. To compute electrostatic interaction energies of neighboring subdomains, we employ an algorithm which performs efficiently on general purpose graphics processing units (GPGPU). Calculations using one CPU for the FMM part and 20 GPGPUs consisting of tens of thousands of execution threads for the numerical integration algorithm show the scalability and parallel performance of the scheme. For calculations on systems consisting of Gaussian functions (alpha = 1) distributed as fullerenes from C-20 to C-720, the total computation time and relative accuracy (ppb) are independent of the system size.

Peer reviewed
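As an illustration of the near-field/far-field split described in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes a single level of an n x n x n grid of cubic subdomains and treats boxes that share a face, edge, or corner as neighbors; the function names `near_neighbors` and `split_pairs` are hypothetical. It shows why the near-field work per box stays bounded (at most 26 neighbors) however large the grid becomes.

```python
from itertools import product


def near_neighbors(box, n):
    """Return the boxes of an n x n x n grid that touch `box` (face, edge, or corner)."""
    neighbors = []
    for offset in product((-1, 0, 1), repeat=3):
        if offset == (0, 0, 0):
            continue  # skip the box itself
        cand = tuple(c + d for c, d in zip(box, offset))
        if all(0 <= c < n for c in cand):
            neighbors.append(cand)
    return neighbors


def split_pairs(n):
    """Split all unordered box pairs into near and far lists.

    Near pairs would be handled by direct numerical integration,
    far pairs by multipole expansions (assumed single-level split).
    """
    boxes = list(product(range(n), repeat=3))
    near, far = [], []
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            # Chebyshev distance <= 1 means the two boxes touch
            if max(abs(x - y) for x, y in zip(a, b)) <= 1:
                near.append((a, b))
            else:
                far.append((a, b))
    return near, far
```

In a full fast multipole method the far list would be processed hierarchically through the tree rather than pair by pair; the flat split here is only meant to make the classification step concrete.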

Crossref

Helsingin yliopiston digitaalinen arkisto

Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi

Author: Alistair P. Rendell
Asadchev A.
De Prince E. A.
Gotz A. W.
Mark S. Gordon
Sarom S. Leang
Shao Y.
Szabo A.
Ufimtsev I. S.
Valiev M.
Vogt L.
Yasuda K.
Publication venue: American Chemical Society (ACS)

Crossref

Excited-State Electronic Structure with Configuration Interaction Singles and Tamm–Dancoff Time-Dependent Density Functional Theory on Graphical Processing Units

Author: Ahmadi G. R.
Almlof J.
Anderson J. A.
Appel H.
Asadchev A.
Badaeva E.
Becke A. D.
Brown P.
Burke K.
Case D. A.
Casida M. E.
Christine M. Isborn
Cohen A. J.
Cordova F.
Dallos M.
Davidson E. R.
Dion M.
Dreuw A.
Foresman J. B.
Friedrichs M. S.
Frisch M. J.
Grabo T.
Grimme S.
Gross E. K. U.
Hancock J. M.
Harpham M. R.
Heyd J.
Hirata S.
Iikura H.
Ivan S. Ufimtsev
Jacquemin D.
Jorgensen W. L.
Kapasi U. J.
Ko C.
Kobayashi Y.
Krylov A. I.
Lebedev V. I.
Lee C.
Levine B. G.
Liu W.
Luehr N.
Maitra N. T.
Martinez T. J.
McMurchie L. E.
Murray C. W.
Nathan Luehr
Nielsen I. B.
Olivares-Amaya R.
Polli D.
Ramakrishna G.
Rohrdanz M. A.
Roos B. O.
Ruckenbauer M.
Runge E.
Schafer L.
Stanton J. F.
Stone J. E.
Tao J.
Tawada Y.
Todd J. Martínez
Tokita Y.
Ufimtsev I. S.
Valiev M.
Virshup A. M.
Vogt L.
Vydrov O. A.
Vysotskiy V. P.
Warshel A.
Whitten J. L.
Yamaguchi S.
Yasuda K.
Publication venue: American Chemical Society

Excited-state calculations are implemented in a development version of the GPU-based TeraChem software package using the configuration interaction singles (CIS) and adiabatic linear response Tamm–Dancoff time-dependent density functional theory (TDA-TDDFT) methods. The speedup of the CIS and TDDFT methods using GPU-based electron repulsion integrals and density functional quadrature integration allows full ab initio excited-state calculations on molecules of unprecedented size. CIS/6-31G and TD-BLYP/6-31G benchmark timings are presented for a range of systems, including four generations of oligothiophene dendrimers, photoactive yellow protein (PYP), and the PYP chromophore solvated with 900 quantum mechanical water molecules. The effects of double and single precision integration are discussed, and mixed precision GPU integration is shown to give extremely good numerical accuracy for both CIS and TDDFT excitation energies (excitation energies within 0.0005 eV of extended double precision CPU results).

Crossref

PubMed Central